منابع مشابه
Dual Temporal Difference Learning
Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of tempor...
متن کاملPreconditioned Temporal Difference Learning
LSTD is numerically instable for some ergodic Markov chains with preferred visits among some states over the remaining ones. Because the matrix that LSTD accumulates has large condition numbers. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms are also proposed to speed up the new met...
متن کاملEmphatic Temporal-Difference Learning
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linea...
متن کاملNatural Temporal Difference Learning
In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. ...
متن کاملQuasi Newton Temporal Difference Learning
Fast convergent and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD, have faster convergence rates but they are computationally slow. On the other hand, there are algorithms that are computationally fast but with slower convergence rate, among them are TD, RG, GTD2 and ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Scholarpedia
سال: 2007
ISSN: 1941-6016
DOI: 10.4249/scholarpedia.1604